Bayesian Hierarchical Cross-Clustering

Authors

  • Dazhuo Li
  • Patrick Shafto
Abstract

Most clustering algorithms assume that all dimensions of the data can be described by a single structure. Cross-clustering (or multiview clustering) allows multiple structures, each applying to a subset of the dimensions. We present a novel approach to cross-clustering, based on approximating the solution to a Cross Dirichlet Process mixture (CDPM) model [Shafto et al., 2006, Mansinghka et al., 2009]. Our bottom-up, deterministic approach results in a hierarchical clustering of dimensions, and at each node, a hierarchical clustering of data points. We also present a randomized approximation, based on a truncated hierarchy, that scales linearly in the number of levels. Results on synthetic and real-world data sets demonstrate that the cross-clustering-based algorithms perform as well as or better than the clustering-based algorithms, that our deterministic approaches perform as well as the MCMC-based CDPM, and that the randomized approximation provides a remarkable speedup relative to the full deterministic approximation with minimal cost in predictive error.
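To make the notion of cross-clustering concrete, the following is a minimal Python sketch. It is not the CDPM model or the algorithm described in the abstract. It only illustrates the idea that different subsets of dimensions (views) can each carry their own partition of the data points. The dictionary representation, the function names, and the toy data are all assumptions made for illustration, and within-cluster sum of squares is used as a simple stand-in for a proper model score.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation (not from the paper): each view is a tuple of
# column indices, mapped to one cluster label per row.
CrossClustering = Dict[Tuple[int, ...], List[int]]


def within_cluster_ss(data: Sequence[Sequence[float]],
                      dims: Tuple[int, ...],
                      labels: List[int]) -> float:
    """Sum of squared deviations from cluster means, restricted to `dims`."""
    total = 0.0
    for label in set(labels):
        rows = [data[i] for i, l in enumerate(labels) if l == label]
        for d in dims:
            mean = sum(r[d] for r in rows) / len(rows)
            total += sum((r[d] - mean) ** 2 for r in rows)
    return total


def cross_clustering_ss(data: Sequence[Sequence[float]],
                        views: CrossClustering) -> float:
    """Total within-cluster sum of squares across all views."""
    return sum(within_cluster_ss(data, dims, labels)
               for dims, labels in views.items())


# Toy data: columns 0 and 1 split the rows one way, column 2 another way.
data = [
    [0.0, 0.0, 5.0],
    [0.0, 0.0, 9.0],
    [1.0, 1.0, 5.0],
    [1.0, 1.0, 9.0],
]

# A single structure over all dimensions cannot fit column 2.
single = {(0, 1, 2): [0, 0, 1, 1]}
# A cross-clustering gives column 2 its own partition of the rows.
cross = {(0, 1): [0, 0, 1, 1], (2,): [0, 1, 0, 1]}

single_ss = cross_clustering_ss(data, single)
cross_ss = cross_clustering_ss(data, cross)
print(single_ss, cross_ss)
```

On this toy data the single clustering leaves residual spread in column 2, while the two-view cross-clustering fits every column exactly. A probabilistic approach such as the one the abstract describes would compare structures by marginal likelihood rather than by this squared-error score.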

Similar resources

A Hierarchical Distance-dependent Bayesian Model for Event Coreference Resolution

We present a novel hierarchical distance-dependent Bayesian model for event coreference resolution. While existing generative models for event coreference resolution are completely unsupervised, our model allows for the incorporation of pairwise distances between event mentions, information that is widely used in supervised coreference models to guide the generative clustering processing for be...


Bayesian coclustering of Anopheles gene expression time series: study of immune defense response to multiple experimental challenges.

We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint...


Bayesian Hierarchical Modelling for Tailoring Metric Thresholds

Software is highly contextual. While there are cross-cutting ‘global’ lessons, individual software projects exhibit many ‘local’ properties. This data heterogeneity makes drawing local conclusions from global data dangerous. A key research challenge is to construct locally accurate prediction models that are informed by global characteristics and data volumes. Previous work has tackled this pro...


Randomized Algorithms for Fast Bayesian Hierarchical Clustering

We present two new algorithms for fast Bayesian Hierarchical Clustering on large data sets. Bayesian Hierarchical Clustering (BHC) [1] is a method for agglomerative hierarchical clustering based on evaluating marginal likelihoods of a probabilistic model. BHC has several advantages over traditional distance-based agglomerative clustering algorithms. It defines a probabilistic model of the data a...


Agglomerative Clustering of Bagged Data Using Joint Distributions

Current methods for hierarchical clustering of data either operate on features of the data or make limiting model assumptions. We present the hierarchy discovery algorithm (HDA), a model-based hierarchical clustering method based on explicit comparison of joint distributions via Bayesian network learning for predefined groups of data. HDA works on both continuous and discrete data and offers a ...




Publication date: 2011